Reproducibility: Now you can, too!

Nicholas Tierney

25 October 2016

The Situation

The Data

Despite “helpful pre-aggregation”, you managed to put out the fire and unaggregate it.

There are only about 500 observations.

You didn’t feel too bad about combing through 500 rows.

You’d only need to do it once

… right?

The paper

You wrote your document in Word

That’s what your other collaborator used.

The analysis

Submit and celebrate

It’s a short, well written paper.

You are really happy with it.

You email it to your collaborators

You get a burger to celebrate.

image from http://aht.seriouseats.com/images/2013/09/20130917-266522-one-eared-stag-meatstick.jpg

Notification

Collaborator:

Wow, great job!

I’ve just managed to get ethical approval to host the data online, and we also have another 10 schools to add …

so we’ve got another 1000 data points!

Do you think you could just redo the plots and the tables and see if anything changes?

… That’s … annoying

You start to imagine …

All of the code

Building the models again

Copying the tables into Word

Inserting the figures

Notification

Sorry, forgot to attach the data.

It’s in the same format as last time, that is fine, right?

Didn’t seem to cause you a problem last time! :)

Blinding Rage

You lose your mind and go into a murderous rage, letting loose on the students in the university armed with nothing but a steak knife.

Wait, What?

So let’s pause for a minute before I take this story further.

It’s 2016.

Is there a better way?

The answer is yes.

The answer is rmarkdown.

Go back in time

Set the wayback machine for 3 months ago.

{ demo }

Pause

Way Back, rewind.

OK. So you’re back in the cafeteria, holding a steak knife.

Your collaborator adds another 1000 rows to your dataset.

You don’t go into a murderous rage.

You take 10 minutes. You eyeball the data. Check it reads in OK.

You knit the document together.

You get changes in the tables and plots.

It’s all in one document. It’s rmarkdown. It’s great.

You send the paper through to your collaborator.

You go to eat another burger, but it’s only been an hour, so you just make a cup of tea instead.

Submit and Publish

Share and Promote

You put all the code and data onto this thing called GitHub, where people can access it. You post about your accomplishment on Twitter, and then you forget about it.

Enter

Someone reads your paper.

They like what they see.

They get the code and the data from GitHub.

They reproduce your results.

Then they add a bit more - they try out this new statistical model on your data.

Reproduction, Publication.

They get some extra insights.

Glean some new information.

They publish a paper based on your research.

Publication, Celebration

You get cited.

You celebrate, you eat a burger.

This whole process, it is reproducible - you can do it too.

It’s free.

Also, it’s free. I think I forgot to mention that.

References